Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor

Abstract

Physical scaling under Moore's law has saturated while the demand for computing keeps growing. Improvements in silicon technology now yield only a smaller silicon area, and speed-power scaling has almost stopped in the last two years. This calls for new parallel computing architectures and new parallel programming methods.

Traditionally, ASIC (Application-Specific Integrated Circuit) hardware has been used to accelerate Digital Signal Processing (DSP) subsystems on SoCs (Systems-on-Chip). As embedded systems become more complicated, more functions, applications, and features must be integrated into one ASIC chip to keep up with market requirements. At the same time, the product lifetime of an ASIC-based SoC has shrunk considerably because of the dynamic market. The design lifetime of a typical main chip in a mobile phone based on ASIC acceleration is about half a year, and its NRE (Non-Recurring Engineering) cost can far exceed US$50 million. This situation calls for an alternative to ASICs. An ASIP (Application-Specific Instruction-set Processor) offers power consumption and silicon cost comparable to those of an ASIC. Its greatest advantage is functional flexibility within a predefined application domain. An ASIP-based SoC allows software upgrades without hardware changes, so its product lifetime can be 5-10 times longer than that of an ASIC-based SoC.

This dissertation presents an ASIP-based SoC, a new unified parallel DSP subsystem named ePUMA (embedded Parallel DSP Platform with Unique Memory Access), which targets embedded signal processing in communication and multimedia applications. The unified DSP subsystem can further reduce the hardware cost of embedded SoC processors, especially the memory cost, and, most importantly, provides full programmability for a wide range of DSP applications. The ePUMA processor is based on a master-slave heterogeneous multi-core architecture: one master core performs central control, and multiple Single Instruction Multiple Data (SIMD) coprocessors work in parallel to deliver the majority of the computing power.

The focus and main contribution of this thesis is the memory subsystem design of ePUMA. The multi-core system uses a distributed memory architecture based on scratchpad memories and software-controlled data movement, which suits the data access properties of streaming applications and the kernel-based multi-core computing model. The essential techniques include the conflict-free access parallel memory architecture, the multi-layer interconnection network, the non-address stream data transfer, the transitioned memory buffers, and the lookup-table-based parallel memory addressing. The goal of the design is to minimize hardware cost, simplify the software protocol for inter-processor communication, and increase arithmetic computing efficiency. Application studies so far show that most DSP algorithms, such as filters, vector/matrix operations, transforms, and arithmetic functions, can achieve a computing efficiency of over 70% on the ePUMA platform. The non-address stream network provides communication bandwidth equivalent to that of a crossbar interconnect at less than 30% of its implementation cost.
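To make the idea of conflict-free parallel memory access concrete, the following is a minimal sketch of a generic textbook technique, row-skewed bank mapping. It is not taken from the thesis and is not claimed to be ePUMA's actual scheme; the bank count, the function names, and the skewing rule are all assumptions chosen for illustration. It shows that with a plain modulo mapping, a column access to a row-major matrix sends every lane to the same bank, whereas a skewed mapping spreads both row and column accesses over distinct banks.

```python
# Illustrative sketch only: a generic skewed bank mapping, NOT the ePUMA design.
# NUM_BANKS and all function names below are hypothetical.
NUM_BANKS = 8  # assumed: 8 single-port memory banks serving 8 SIMD lanes


def linear_bank(addr):
    """Plain interleaving: consecutive addresses go to consecutive banks."""
    return addr % NUM_BANKS


def skewed_bank(addr):
    """Row-skewed interleaving: each row is rotated by its row index."""
    row = addr // NUM_BANKS
    return (addr + row) % NUM_BANKS


def is_conflict_free(addresses, bank_of):
    """True if every address in one parallel access hits a different bank."""
    banks = [bank_of(a) for a in addresses]
    return len(set(banks)) == len(banks)


def row_access(row, width=NUM_BANKS):
    """Addresses of one row of a row-major matrix with `width` columns."""
    return [row * width + col for col in range(width)]


def column_access(col, width=NUM_BANKS):
    """Addresses of one column of a row-major matrix with `width` columns."""
    return [row * width + col for row in range(width)]


if __name__ == "__main__":
    for name, mapping in (("linear", linear_bank), ("skewed", skewed_bank)):
        print(name,
              "row ok:", is_conflict_free(row_access(3), mapping),
              "column ok:", is_conflict_free(column_access(2), mapping))
```

In a hardware setting the per-lane bank indices could be precomputed and stored in a small table rather than computed on the fly, which is one plausible reading of lookup-table-based addressing; the abstract itself does not give that level of detail.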

Bibliographic record

  • Author

    Wang, Jian;

  • Author affiliation
  • Year 2014
  • Total pages
  • Original format PDF
  • Language eng
  • Chinese Library Classification
